    Faster Clustering via Preprocessing

    We examine the efficiency of clustering a set of points, when the encompassing metric space may be preprocessed in advance. In computational problems of this genre, there is a first stage of preprocessing, whose input is a collection of points $M$; the next stage receives as input a query set $Q \subset M$, and should report a clustering of $Q$ according to some objective, such as 1-median, in which case the answer is a point $a \in M$ minimizing $\sum_{q\in Q} d_M(a,q)$. We design fast algorithms that approximately solve such problems under standard clustering objectives like $p$-center and $p$-median, when the metric $M$ has low doubling dimension. By leveraging the preprocessing stage, our algorithms achieve query time that is near-linear in the query size $n=|Q|$, and is (almost) independent of the total number of points $m=|M|$.
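
    As a point of reference only, the 1-median objective above can be written down directly. The sketch below is a naive brute-force baseline over an explicit Euclidean point set, not the preprocessing-based approximation algorithm of the paper; the point representation and the dist function are illustrative assumptions.

        # Naive 1-median baseline: try every candidate center a in M and sum its
        # distances to the query set Q. This only pins down the objective; the
        # paper's algorithms approximate it in time near-linear in |Q|.
        import math

        def dist(p, q):
            # Illustrative metric: Euclidean distance on coordinate tuples.
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

        def one_median(M, Q):
            # Return the point a in M minimizing sum_{q in Q} d(a, q).
            return min(M, key=lambda a: sum(dist(a, q) for q in Q))

        M = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.5, 0.5)]
        Q = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.5)]
        print(one_median(M, Q))  # prints (0.5, 0.5), the best center from M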

    Color-Distance Oracles and Snippets

    In the snippets problem we are interested in preprocessing a text T so that given two pattern queries P_1 and P_2, one can quickly locate the occurrences of the patterns in T that are closest to each other. A closely related problem is that of constructing a color-distance oracle, where the goal is to preprocess a set of points from some metric space, in which every point is associated with a set of colors, so that given two colors one can quickly locate two points associated with those colors that are as close as possible to each other. We introduce efficient data structures for both color-distance oracles and the snippets problem. Moreover, we prove conditional lower bounds for these problems from both the 3SUM conjecture and the Combinatorial Boolean Matrix Multiplication conjecture.
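
    For concreteness, here is a brute-force reference implementation of the color-distance query, not the oracle constructed in the paper: points are grouped by color once, and a query compares all pairs for the two requested colors. One-dimensional coordinates and the color names are illustrative assumptions; in the snippets setting the coordinates would be occurrence positions of P_1 and P_2 in T.

        # Brute-force color-distance queries: group points by color, then scan
        # all cross pairs for the two queried colors. Quadratic per query, so it
        # only illustrates the problem the oracle answers quickly.
        from collections import defaultdict

        def build_color_index(colored_points):
            # colored_points: iterable of (coordinate, set of colors).
            index = defaultdict(list)
            for coord, colors in colored_points:
                for c in colors:
                    index[c].append(coord)
            return index

        def closest_pair_by_color(index, c1, c2):
            # Return (p1, p2) with p1 colored c1 and p2 colored c2, minimizing |p1 - p2|.
            best = None
            for p1 in index[c1]:
                for p2 in index[c2]:
                    if best is None or abs(p1 - p2) < abs(best[0] - best[1]):
                        best = (p1, p2)
            return best

        index = build_color_index([(3, {"red"}), (10, {"blue"}), (11, {"red", "green"})])
        print(closest_pair_by_color(index, "red", "blue"))  # (11, 10)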

    A Simple Algorithm for Approximating the Text-To-Pattern Hamming Distance

    The algorithmic task of computing the Hamming distance between a given pattern of length $m$ and each location in a text of length $n$, both over a general alphabet $\Sigma$, is one of the most fundamental algorithmic tasks in string algorithms. The fastest known runtime for exact computation is $\tilde{O}(n\sqrt{m})$. We recently introduced a complicated randomized algorithm for obtaining a $(1 \pm \epsilon)$ approximation for each location in the text in $O((n/\epsilon)\log(1/\epsilon)\log n\log m\log|\Sigma|)$ total time, breaking a barrier that stood for 22 years. In this paper, we introduce an elementary and simple randomized algorithm that takes $O((n/\epsilon)\log n\log m)$ time.
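
    The task itself is easy to state in code. The routine below computes the exact text-to-pattern Hamming distances naively in $O(nm)$ time; it is only a specification of what is being computed, not the randomized $(1 \pm \epsilon)$-approximation algorithm of the paper.

        # Exact text-to-pattern Hamming distance, naive O(nm) definition: for
        # every alignment of the pattern in the text, count mismatching characters.
        def text_to_pattern_hamming(text, pattern):
            n, m = len(text), len(pattern)
            return [
                sum(text[i + j] != pattern[j] for j in range(m))
                for i in range(n - m + 1)
            ]

        print(text_to_pattern_hamming("abracadabra", "abr"))
        # [0, 3, 3, 2, 3, 2, 3, 0, 3]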

    Answering Spatial Multiple-Set Intersection Queries Using 2-3 Cuckoo Hash-Filters

    We show how to answer spatial multiple-set intersection queries in $O(n(\log w)/w + kt)$ expected time, where $n$ is the total size of the $t$ sets involved in the query, $w$ is the number of bits in a memory word, $k$ is the output size, and $c$ is any fixed constant. This improves the asymptotic performance over previous solutions and is based on an interesting data structure, known as 2-3 cuckoo hash-filters. Our results apply in the word-RAM model (or practical RAM model), which allows for constant-time bit-parallel operations, such as bitwise AND, OR, NOT, and MSB (most-significant 1-bit), as exist in modern CPUs and GPUs. Our solutions apply to any multiple-set intersection queries in spatial data sets that can be reduced to one-dimensional range queries, such as spatial join queries for one-dimensional points or sets of points stored along space-filling curves, which are used in GIS applications. (Full version of a paper from the 2017 ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems.)
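
    The word-RAM bit-parallelism mentioned above can be illustrated with a toy multiple-set intersection: pack each set over a small universe into a machine word, intersect with bitwise ANDs, and report members by repeated MSB extraction. This is only a sketch of the bit-level toolkit, not the 2-3 cuckoo hash-filter data structure itself.

        # Toy bit-parallel multiple-set intersection over a small universe
        # {0, ..., 63}: one word per set, AND to intersect, MSB extraction to report.
        def to_bitset(elements):
            word = 0
            for e in elements:
                word |= 1 << e
            return word

        def intersect_and_report(sets):
            words = [to_bitset(s) for s in sets]
            word = words[0]
            for w in words[1:]:
                word &= w  # one AND intersects a whole word's worth of elements
            out = []
            while word:
                msb = word.bit_length() - 1  # most-significant 1-bit
                out.append(msb)
                word ^= 1 << msb
            return out

        print(intersect_and_report([[1, 4, 7, 9], [4, 7, 8], [0, 4, 7]]))  # [7, 4]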

    Selection in the Presence of Memory Faults, with Applications to In-place Resilient Sorting

    The selection problem, where one wishes to locate the $k^{th}$ smallest element in an unsorted array of size $n$, is one of the basic problems studied in computer science. The main focus of this work is designing algorithms for solving the selection problem in the presence of memory faults. These can happen as the result of cosmic rays, alpha particles, or hardware failures. Specifically, the computational model assumed here is a faulty variant of the RAM model (abbreviated as FRAM), which was introduced by Finocchi and Italiano. In this model, the content of memory cells might get corrupted adversarially during the execution, and the algorithm is given an upper bound $\delta$ on the number of corruptions that may occur. The main contribution of this work is a deterministic resilient selection algorithm with optimal $O(n)$ worst-case running time. Interestingly, the running time does not depend on the number of faults, and the algorithm does not need to know $\delta$. The aforementioned resilient selection algorithm can be used to improve the complexity bounds for resilient $k$-d trees developed by Gieseke, Moruz and Vahrenhold. Specifically, the time complexity for constructing a $k$-d tree is improved from $O(n\log^2 n + \delta^2)$ to $O(n\log n)$. Besides the deterministic algorithm, a randomized resilient selection algorithm is developed, which is simpler than the deterministic one, and has $O(n + \alpha)$ expected time complexity and $O(1)$ space complexity (i.e., is in-place), where $\alpha \le \delta$ is the number of corruptions that actually occur. This algorithm is used to develop the first resilient sorting algorithm that is in-place and achieves optimal $O(n\log n + \alpha\delta)$ expected running time.
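
    For contrast with the resilient algorithms above, here is the classical randomized quickselect for the fault-free RAM model; it pins down the selection problem but is neither in-place nor resilient, since a single corrupted comparison can send it down the wrong branch.

        # Classical randomized quickselect: k-th smallest element (1-indexed) of an
        # unsorted list in O(n) expected time. Shown only as the fault-free baseline;
        # it offers no protection against adversarial memory corruptions.
        import random

        def quickselect(a, k):
            pivot = random.choice(a)
            less = [x for x in a if x < pivot]
            equal = [x for x in a if x == pivot]
            if k <= len(less):
                return quickselect(less, k)
            if k <= len(less) + len(equal):
                return pivot
            greater = [x for x in a if x > pivot]
            return quickselect(greater, k - len(less) - len(equal))

        print(quickselect([9, 1, 8, 2, 7, 3], 3))  # 3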